Abstract: This paper presents a novel relational database
architecture for visual object classification and retrieval. The
framework is based on the bag-of-features image representation
model combined with Support Vector Machine classification
and is integrated into a Microsoft SQL Server database.
I. Introduction

Thanks to content-based image retrieval (CBIR), we are able to
search for similar images and classify them. Images can be
analyzed based on color representation, textures, shape [22], [23]
or edge detectors. Recently, local invariant features have gained
wide popularity [26], [27], [28], [29]. The most popular local
keypoint detectors and descriptors are SURF [30], SIFT [25]
and ORB. To find images similar to a query image, we
need to compare all feature descriptors of all images, usually
by some distance measure. Such a comparison is enormously
time-consuming, and there is ongoing worldwide research to
speed up the process. Yet the current state of the art in
high-dimensional computer vision applications is not
fully satisfactory. The literature presents countless methods
and variants utilizing, e.g., a voting scheme or histograms of
clustered keypoints. They are mostly based on some form
of approximate search. Recently, the bag-of-features (BoF)
approach [33], [29], [34], [35] has gained in popularity. In the
BoF method, clustered vectors of image features are collected
and sorted by the count of occurrence (histograms). All
individual descriptors, or approximations of sets of descriptors
presented in histogram form, must be compared. Such
calculations are computationally expensive. Moreover, the
BoF approach requires the classifiers to be redesigned when new
visual classes are added to the system.
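
To make the histogram idea concrete, the following minimal Python
sketch (not taken from the paper, whose implementation is in C#)
assigns each descriptor to its nearest dictionary word, counts the
occurrences, and compares two images by the distance between their
histograms. The function names, the toy 2-D vectors and the use of
the Euclidean distance are illustrative assumptions.

```python
import math

def nearest_word(descriptor, dictionary):
    """Return the index of the dictionary word closest to the descriptor."""
    distances = [math.dist(descriptor, word) for word in dictionary]
    return distances.index(min(distances))

def bof_histogram(descriptors, dictionary):
    """Count how many descriptors of one image fall into each visual word."""
    counts = [0] * len(dictionary)
    for descriptor in descriptors:
        counts[nearest_word(descriptor, dictionary)] += 1
    return counts

def histogram_distance(hist_a, hist_b):
    """Euclidean distance between two histograms (an assumed measure)."""
    return math.dist(hist_a, hist_b)

# Toy 2-D data; real SIFT descriptors have 128 dimensions.
dictionary = [(0.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
image_a = [(0.5, 0.2), (9.0, 9.5), (1.0, 0.0), (0.2, 9.9)]
image_b = [(9.5, 9.5), (10.0, 9.0), (0.1, 0.1)]

hist_a = bof_histogram(image_a, dictionary)
hist_b = bof_histogram(image_b, dictionary)
distance = histogram_distance(hist_a, hist_b)
```

Comparing one short histogram per image replaces the comparison of
every descriptor pair, which is the saving the BoF representation
is meant to provide.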

The paper deals with the visual query-by-example problem
in relational databases. Namely, we developed a system based
on Microsoft SQL Server which is able to classify a sample
image or to return images similar to it. Storing huge
amounts of undefined and unstructured binary data, and searching
and retrieving it quickly and efficiently, is the main challenge
for database designers. Examples of such data are images,
video files, etc. Users of the world's most popular relational
database management systems (RDBMS), such as Oracle, MS SQL
Server and IBM DB2 Server, are not encouraged to store such
data directly in the database files. An example of such an
approach is Microsoft SQL Server, where binary data is
stored outside the RDBMS and only the information about
the data location is stored in the database tables. MS SQL
Server utilizes a special field type called FileStream, which
integrates the SQL Server database engine with the NTFS file
system by storing binary large object (BLOB) data as files in the
file system. Microsoft's SQL dialect (Transact-SQL) statements
can insert, update, query, search, and back up FileStream
data. The application programming interface provides streaming
access to the data. FileStream uses the operating system cache for
caching file data, which helps to reduce any negative effect
that FileStream data might have on RDBMS performance.
The FileStream data type is stored as a varbinary(max) column
with a pointer to the actual data, which are stored as BLOBs in the
NTFS file system. By setting the FileStream attribute on a
column and consequently storing BLOB data in the file system,
we achieve the following advantages:

• Performance is the same as that of the NTFS file system,
and the SQL Server cache is not burdened with the
FileStream data.

• Standard SQL statements such as SELECT, INSERT,
UPDATE, and DELETE work with FileStream data;
however, the associated files can be treated as standard
NTFS files.

In the proposed system, large image files are stored in a
FileStream field. Unfortunately, despite this technique,
existing relational database management systems offer no
technology for fast and efficient retrieval of images based on
their content. The standard SQL language does not contain
commands for handling multimedia, large text objects, and
spatial data.

We designed a special type of field, in which a set of
keypoints can be stored in an optimal way, as a so-called
User-Defined Type (UDT). Along with defining the new type of
field, it is necessary to implement methods to compare its
content. When designing a UDT, various features must also be
implemented, depending on whether the UDT is implemented as a
class or a structure, as well as on the format and serialization
options. This can be done in one of the supported .NET
Framework programming languages, and the UDT can be
implemented as a dynamic-link library (DLL) loaded into MS
SQL Server. Another major challenge is to create a special
database indexing algorithm, which would significantly speed
up answering SQL queries for data based on the newly
defined field.

As mentioned above, standard SQL does not contain
commands for handling multimedia, large text objects and
spatial data. Thus, communities that create software for processing
such specific data types began to draw up SQL extensions,
but these turned out to be incompatible with each other. This
problem led to abandoning new task-specific extensions of
SQL, and a new concept prevailed, based on libraries of SQL99
object types intended for processing application-specific data.
The new standard, known as SQL/MM (full name: SQL
Multimedia and Application Packages), was based on objects;
thus, programming library functionality is naturally available
in SQL queries by calling library methods. SQL/MM consists
of several parts: framework (a library for general purposes),
full text (data types for storing and searching large
amounts of text), spatial (for processing geospatial data), still
image (types for processing images) and data mining
(data exploration). There are also attempts to create SQL
extensions using fuzzy logic for building flexible queries.
Possibilities of creating flexible queries and queries based
on user examples have also been presented in the literature.
It should be emphasized that the literature shows few efforts
to create a general way of querying multimedia data.

The main contributions and novelty of the paper are as
follows:

• We present a novel system for content-based image
classification built into a Microsoft SQL Server
database.

• We created a special database indexing algorithm,
which significantly speeds up answering visual
query-by-example SQL queries in relational databases.

The paper is organized as follows. Section II describes
the proposed database system. Section III provides simulation
results on the PASCAL Visual Object Classes (VOC) 2012
dataset.

II. System Architecture and Relational Database Structure

Our system, and the BoF approach in general, can work with
various image features. In this paper we use SIFT features as an
example. To calculate SIFT keypoints we used the OpenCV
library. We did not call functions from this library as
User-Defined Functions (UDFs) directly in the database environment
because:

1) User-Defined Functions can be written only for the
same .NET Framework version as that of MS SQL Server
(e.g., MS SQL Server was built on .NET 4.0).

2) The calculations used to find image keypoints are very
complex; running such computations directly on
the database server causes the database engine to
become unresponsive.

For the above reasons, as in the case of the Full-Text Search
technology, the most time-consuming computations are moved to
the operating system as background WCF (Windows Communication
Foundation) system services. The WCF Data Service follows the
REST (Representational State Transfer) architecture, which was
introduced by Roy T. Fielding in his PhD thesis. Thanks to the
WCF technology, it is relatively easy to deploy the proposed
solution on the Internet.

To store image local keypoints in the database, we created a
User-Defined Type (column sift_keypoints in the SIFTS table).
These values are not used in the classification of new query
images. They are stored so that, if we need to identify a new
class of objects in the existing images, the keypoint descriptors
do not have to be generated again. The new type was written in C#
as a CLR class, and only its serialized form is stored in the
database. The database also stores the parameters of the Support
Vector Machine classifiers, in the SVMConfigs table. This approach
allows the service to be run at any time with the learned
parameters: when the service is started in the operating system,
it reads the SVM classifiers from the database. The Stats table
collects algorithm statistics, of which the most important are
the execution times of consecutive stages of the algorithm. The
Images table stores the membership of images in visual classes.

Fig. 2. Proposed database structure

The Dictionaries table stores keypoint cluster data; the
cluster parameters are kept in the DictionaryData field of UDT
type:

    using System.Data.SqlTypes;        // INullable
    using Microsoft.SqlServer.Server;  // IBinarySerialize

    public struct DictionaryData :
        INullable, IBinarySerialize
    {
        private bool _null;

        public int WordsCount { get; set; }
        public int SingleWordSize { get; set; }
        public double[][] Values { get; set; }
        public override string ToString()
        // (method body and interface members are not shown in the source)
    }
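
Since only the serialized form of the UDT is stored, it may help to
see what such a binary layout could look like. The sketch below is
in Python rather than the authors' C#, and the layout itself (two
little-endian 32-bit integers followed by 64-bit floating-point
values, word after word) is an assumption made for illustration; the
paper does not give the actual format. The function names are
hypothetical.

```python
import struct

HEADER = "<ii"  # assumed header: words_count, single_word_size (int32 each)

def serialize_dictionary(values):
    """Pack cluster centers as a header followed by float64 values."""
    words_count = len(values)
    single_word_size = len(values[0]) if values else 0
    payload = struct.pack(HEADER, words_count, single_word_size)
    for word in values:
        payload += struct.pack(f"<{single_word_size}d", *word)
    return payload

def deserialize_dictionary(payload):
    """Inverse of serialize_dictionary: rebuild the counts and the matrix."""
    words_count, single_word_size = struct.unpack_from(HEADER, payload, 0)
    offset = struct.calcsize(HEADER)
    values = []
    for _ in range(words_count):
        word = struct.unpack_from(f"<{single_word_size}d", payload, offset)
        values.append(list(word))
        offset += 8 * single_word_size  # 8 bytes per float64
    return words_count, single_word_size, values

# Toy dictionary with three 2-D words; SIFT words would have 128 values.
centers = [[0.5, 1.5], [2.0, 3.0], [4.25, 5.75]]
blob = serialize_dictionary(centers)
restored = deserialize_dictionary(blob)
```

Under this assumed layout a single 128-value SIFT word occupies
1024 bytes, so even a small dictionary is far larger than what fits
in an index key, which is one reason to store it as an opaque
serialized field.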


The WordsCount variable stores the number of words in the BoF
dictionary. The value of SingleWordSize depends on the algorithm
used to generate the image keypoint descriptors; in the case of
the SIFT algorithm, it equals 128. The two-dimensional matrix
Values stores the cluster centers. The system operates in two
modes:

1) Learning Mode: Image keypoint descriptors are
clustered by the k-means algorithm to build a bag-of-features
dictionary. Cluster parameters are stored in
DictionaryData variables. Next, image descriptors
are created for subsequent images. They can be regarded as
histograms of the membership of image local keypoints to words
from the dictionaries. We use the ComputeDescriptorsRaw method
of SIFTDetector from the Emgu CV (http://www.emgu.com) library
with the following signature: ComputeDescriptorsRaw(Image<Gray,
byte> grayScaleImage, Image<Gray, byte> mask,
VectorOfKeypoint keypoints). The obtained descriptors are then
stored in the Descriptors table in a field of UDT type:


    public struct DescriptorData :
        INullable, IBinarySerialize
    {
        // Private member
        private bool _null;

        public int WordsCount { get; set; }
        public double[] Values { get; set; }
    }


Using records from this table, learning datasets are generated
for the SVM classifiers to recognize various visual classes. The
parameters of these classifiers are stored after the training
phase in the SVMConfigs table.
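
The dictionary-building step of the learning mode can be sketched
as follows. This is a plain Lloyd's k-means in Python, written only
for illustration; it is not the authors' code (the paper states
that its dictionaries were built with OpenCV). Taking the first k
points as initial centers, the fixed iteration count and the toy 2-D
points are assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means; the first k points seed the centers (assumed)."""
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centers[i] = [sum(c) / len(members) for c in zip(*members)]
    return centers

# Toy keypoint descriptors; real SIFT descriptors have 128 dimensions.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0),
          (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]
words = kmeans(points, k=2)

# The result maps onto the fields of DictionaryData.
words_count = len(words)          # WordsCount
single_word_size = len(words[0])  # SingleWordSize
```

Each returned center plays the role of one row of the Values
matrix; the image histograms are then computed against these
centers.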

2) Classification Mode: In the classification phase, the
proposed system works fully automatically. After an image file
is sent to the Images_FT table, a service generating
local interest points is launched. In the proposed approach,
we use SIFT descriptors. Next, the visual descriptors are
checked for membership in the clusters stored in the database
in the Dictionaries table, and on this basis the histogram
descriptor is created. To determine membership in a visual
class, we use this vector as the input for all SVM
classifiers obtained in the learning phase. For classification
purposes, we extended the SQL language and defined the
GetClassOfImage() method.
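
The final step, passing the histogram to every stored SVM
classifier, might look like the Python sketch below. It is an
illustration only: the paper does not state the kernel or the rule
for combining the classifiers, so the linear decision function and
the choice of the highest-scoring class are assumptions, and the
weights stand in for hypothetical rows of the SVMConfigs table.

```python
def decision_value(histogram, weights, bias):
    """Linear SVM decision function w.x + b (linear kernel is assumed)."""
    return sum(w * x for w, x in zip(weights, histogram)) + bias

def classify(histogram, classifiers):
    """Evaluate every per-class SVM and return the best-scoring class."""
    scores = {
        name: decision_value(histogram, weights, bias)
        for name, (weights, bias) in classifiers.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical learned parameters, one (weights, bias) pair per class.
classifiers = {
    "Bus": ([1.0, -1.0, 0.0], -0.1),
    "Cat": ([-1.0, 1.0, 0.0], 0.0),
    "Train": ([0.0, -0.5, 1.0], -0.2),
}

# Histogram descriptor of the query image over a 3-word dictionary.
histogram = [0.5, 0.25, 0.25]
label, scores = classify(histogram, classifiers)
```

Because each class has its own classifier, the histogram is
computed once and reused for every class.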

The GetClassOfImage() method is written in C# and was added to
the set of User-Defined Functions. The argument of this
method is the file identifier from the FileTable table.

Microsoft SQL Server constrains the total size of indexed
columns to 900 bytes. Therefore, it was not possible to create
an index on the columns constituting the visual descriptors. To
allow fast image searching of the Descriptors table, we
created a field comparative_descriptor that stores the
descriptor value hashed by the MD5 algorithm. This made it
possible to create an index on the new column, and thus the time
needed to find an image corresponding to the query image was
reduced substantially.


III. Numerical Simulations

We tested the proposed method on three classes of visual
objects taken from the PASCAL Visual Object Classes (VOC)
dataset, namely: Bus, Cat and Train. We divided these
three classes of objects into learning and testing examples. The
testing set consists of 15% of the images from the whole dataset.
Before the learning procedure, we generated local keypoint
vectors for all images from the PASCAL VOC dataset using the
SIFT algorithm. All simulations were performed on a Hyper-V
virtual machine with the MS Windows operating system (8
GB RAM, Intel Xeon X5650, 2.67 GHz). The testing set only
contained images that had never been presented to the system
during the learning process.

Fig. 3. Example images from the testing dataset

The bag-of-features image representation model combined
with the Support Vector Machine (SVM) classification was
run five times for various dictionary sizes: 40, 50, 80, 100,
130 and 150 words. Dictionaries for the BoF were created
using the C++ language, based on the OpenCV library. The
results of the BoF and SVM classification on the testing data
are presented in Table I. The SQL query responses are nearly
real-time even for relatively large image datasets.

TABLE I. Numerical simulation results for various BoF
dictionary sizes.

Words:     40    50    80   100   130   150
Bus       40%   50%   60%   60%   70%   50%
Cat       90%   80%   50%   80%   80%   80%
Train      0%    0%   10%   20%   10%   10%
Result:   43%   43%   40%   53%   53%   47%


IV. Conclusion

We presented a method that allows integrating a relatively
fast content-based image classification algorithm with a
relational database management system. Namely, we used bag of
features, Support Vector Machine classifiers and special
Microsoft SQL Server features, such as User-Defined Types and
CLR methods, to classify and retrieve visual data. Moreover,
we created indexes to search for the same query image in large
sets of visual records. The described framework allows automatic
searching and retrieval of images on the basis of their content
using the SQL language. The SQL responses are nearly real-time
even on relatively large image datasets. The system can
be extended to use different visual features or to have a more
flexible SQL querying command set.